class_likelihood_ratios (LR+ / LR-)#

Compute the positive and negative likelihood ratios for a binary classifier.

In scikit-learn this is sklearn.metrics.class_likelihood_ratios.

Learning goals#

  • Derive (LR_+) and (LR_-) from the confusion matrix

  • Interpret them as odds multipliers (pre-test (\to) post-test probabilities)

  • Implement the metric from scratch in NumPy (weights + label ordering)

  • Visualize how likelihood ratios change with the decision threshold

  • Use likelihood ratios to pick an operating point (screening vs confirmation)

Prerequisites#

  • Confusion matrix, sensitivity/specificity

  • Basic Bayes rule / odds

  • Logistic regression + ROC curves (helpful, but not required)

import warnings

import numpy as np

import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

from sklearn.datasets import make_classification
from sklearn.metrics import class_likelihood_ratios, roc_curve
from sklearn.model_selection import train_test_split

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(7)

1) Definition: likelihood ratios as conditional probability ratios#

Treat a classifier’s prediction as a diagnostic test:

  • test positive (\iff) predict the positive class

  • test negative (\iff) predict the negative class

The likelihood ratios compare how often the test is positive/negative under each true class:

[ LR_+ = \frac{P(\hat{y}=1 \mid y=1)}{P(\hat{y}=1 \mid y=0)} \qquad LR_- = \frac{P(\hat{y}=0 \mid y=1)}{P(\hat{y}=0 \mid y=0)}. ]

Why this is useful: in odds form, Bayes' rule becomes a simple multiplication.

Define odds for a probability (p):

[ \operatorname{odds}(p) = \frac{p}{1-p}. ]

Then the update is:

[ \operatorname{odds}(y=1 \mid \text{test}+) = \operatorname{odds}(y=1)\cdot LR_+, ]

[ \operatorname{odds}(y=1 \mid \text{test}-) = \operatorname{odds}(y=1)\cdot LR_-. ]

Converting odds back to probability:

[ p = \frac{\operatorname{odds}}{1 + \operatorname{odds}}. ]

Equivalently in log-odds:

[ \operatorname{logit}(p_{post}) = \operatorname{logit}(p_{pre}) + \log(LR). ]

Key point: (LR_+) and (LR_-) are functions of sensitivity and specificity (not prevalence), but turning them into post-test probabilities requires a prior (pre-test probability).
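A quick numeric sanity check of the odds machinery, with made-up numbers (a minimal sketch; the same odds helpers appear again with NumPy support further below):

```python
def odds(p):
    # Probability -> odds.
    return p / (1.0 - p)


def prob_from_odds(o):
    # Odds -> probability.
    return o / (1.0 + o)


# Illustrative numbers (not from any dataset): 10% pre-test probability, LR+ = 10.
p_pre = 0.10
lr_plus = 10.0

p_post = prob_from_odds(odds(p_pre) * lr_plus)
print(f"pre-test p = {p_pre:.2f} -> post-test p = {p_post:.3f}")  # -> 0.526
```

Note that the same (LR_+) moves the probability by very different absolute amounts depending on the prior: the update is multiplicative in odds, not additive in probability.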

2) From confusion matrix to (LR_+) and (LR_-)#

For a binary classifier with positive class (y=1) and negative class (y=0):

[ \begin{array}{c|cc} & \hat{y}=0 & \hat{y}=1 \\ \hline y=0 & TN & FP \\ y=1 & FN & TP \end{array} ]

Define:

  • Sensitivity / recall / true positive rate (TPR)

    [ \text{TPR} = \frac{TP}{TP+FN} ]

  • Specificity / true negative rate (TNR)

    [ \text{TNR} = \frac{TN}{TN+FP} ]

  • False positive rate (FPR): (\text{FPR} = 1-\text{TNR} = \frac{FP}{TN+FP})

  • False negative rate (FNR): (\text{FNR} = 1-\text{TPR} = \frac{FN}{TP+FN})

Then:

[ LR_+ = \frac{\text{TPR}}{\text{FPR}} = \frac{\text{sensitivity}}{1-\text{specificity}} ]

[ LR_- = \frac{\text{FNR}}{\text{TNR}} = \frac{1-\text{sensitivity}}{\text{specificity}}. ]

A commonly used single-number summary is the diagnostic odds ratio:

[ \text{DOR} = \frac{LR_+}{LR_-} = \frac{TP\cdot TN}{FP\cdot FN}, ]

but note it can be undefined/infinite when (FP=0) or (FN=0).
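These identities are easy to verify numerically. Here is a check with made-up counts (any counts with (FP > 0) and (FN > 0) work):

```python
# Illustrative confusion-matrix counts (not from any dataset)
tp, fp, tn, fn = 90, 20, 80, 10

tpr = tp / (tp + fn)   # sensitivity = 0.9
fpr = fp / (tn + fp)   # 1 - specificity = 0.2
fnr = fn / (tp + fn)   # 0.1
tnr = tn / (tn + fp)   # specificity = 0.8

lr_plus = tpr / fpr    # 4.5
lr_minus = fnr / tnr   # 0.125
dor = lr_plus / lr_minus

# DOR identity: LR+ / LR- == (TP * TN) / (FP * FN)
assert abs(dor - (tp * tn) / (fp * fn)) < 1e-9
print(lr_plus, lr_minus, dor)  # 4.5 0.125 36.0
```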

def _infer_binary_labels(y_true, y_pred, labels=None):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    if labels is None:
        labels = np.unique(np.concatenate([np.unique(y_true), np.unique(y_pred)]))
        if labels.shape[0] != 2:
            raise ValueError(f"Expected 2 labels for binary classification, got {labels!r}")
        labels = np.sort(labels)  # sklearn default
    else:
        labels = np.asarray(labels)
        if labels.shape[0] != 2:
            raise ValueError("labels must be of length 2: [negative_class, positive_class]")

    neg_label, pos_label = labels[0], labels[1]
    return neg_label, pos_label


def confusion_counts_binary(y_true, y_pred, *, labels=None, sample_weight=None):
    '''Return (tp, fp, tn, fn) as floats.'''
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    neg_label, pos_label = _infer_binary_labels(y_true, y_pred, labels=labels)

    if sample_weight is None:
        w = np.ones_like(y_true, dtype=float)
    else:
        w = np.asarray(sample_weight, dtype=float)
        if w.shape != y_true.shape:
            raise ValueError("sample_weight must have shape (n_samples,)")

    is_pos_true = y_true == pos_label
    is_pos_pred = y_pred == pos_label

    tp = np.sum(w * (is_pos_true & is_pos_pred))
    fp = np.sum(w * (~is_pos_true & is_pos_pred))
    tn = np.sum(w * (~is_pos_true & ~is_pos_pred))
    fn = np.sum(w * (is_pos_true & ~is_pos_pred))

    return float(tp), float(fp), float(tn), float(fn)


def class_likelihood_ratios_numpy(
    y_true,
    y_pred,
    *,
    labels=None,
    sample_weight=None,
    raise_warning=True,
):
    '''NumPy implementation matching sklearn.metrics.class_likelihood_ratios.'''
    tp, fp, tn, fn = confusion_counts_binary(
        y_true, y_pred, labels=labels, sample_weight=sample_weight
    )

    pos_total = tp + fn
    neg_total = tn + fp

    if pos_total == 0 or neg_total == 0:
        if raise_warning:
            warnings.warn(
                "No positive or no negative samples in y_true; likelihood ratios are undefined.",
                UserWarning,
            )
        return (np.nan, np.nan)

    tpr = tp / pos_total
    fnr = fn / pos_total
    fpr = fp / neg_total
    tnr = tn / neg_total

    lr_plus = np.nan
    lr_minus = np.nan

    if fpr == 0:
        if raise_warning:
            warnings.warn("When there are no false positives (FPR == 0), the positive likelihood ratio is undefined.")
    else:
        lr_plus = tpr / fpr

    if tnr == 0:
        if raise_warning:
            warnings.warn("When there are no true negatives (TNR == 0), the negative likelihood ratio is undefined.")
    else:
        lr_minus = fnr / tnr

    return (lr_plus, lr_minus)
# Quick sanity checks vs scikit-learn

y_true = [0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0]

print("sklearn:", class_likelihood_ratios(y_true, y_pred))
print("numpy :", class_likelihood_ratios_numpy(y_true, y_pred, raise_warning=False))

y_true = np.array(["non-cat", "cat", "non-cat", "cat", "non-cat"])
y_pred = np.array(["cat", "cat", "non-cat", "non-cat", "non-cat"])

print()
print("Default label order (sorted):")
print("sklearn:", class_likelihood_ratios(y_true, y_pred))

print()
print("Explicit labels=[negative, positive]:")
print("sklearn:", class_likelihood_ratios(y_true, y_pred, labels=["non-cat", "cat"]))
sklearn: (1.5, 0.75)
numpy : (1.5, 0.75)

Default label order (sorted):
sklearn: (1.3333333333333333, 0.6666666666666666)

Explicit labels=[negative, positive]:
sklearn: (1.5, 0.75)

3) Interpretation and common pitfalls#

Valid ranges (for a useful classifier):

  • (LR_+ \ge 1). Values close to 1 mean “a positive prediction barely changes the odds”.

  • (0 \le LR_- \le 1). Values close to 1 mean “a negative prediction barely changes the odds”.

If you ever see (LR_+ < 1) or (LR_- > 1), the classifier is doing worse than chance for that prediction type: most often the labels are effectively flipped (or your labels=[negative, positive] ordering is wrong).
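A minimal sketch of that failure mode, reusing the small 0/1 example from the sanity checks above with the predictions inverted (the `likelihood_ratios` helper below is a stripped-down, illustration-only version that assumes 0/1 labels, (FP > 0), and (TN > 0)):

```python
import numpy as np


def likelihood_ratios(y_true, y_pred):
    # Plain-count LR+ / LR- for 0/1 labels (assumes FP > 0 and TN > 0).
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tpr, fpr = tp / (tp + fn), fp / (tn + fp)
    fnr, tnr = fn / (tp + fn), tn / (tn + fp)
    return float(tpr / fpr), float(fnr / tnr)


y_true = [0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0]

print(likelihood_ratios(y_true, y_pred))                   # (1.5, 0.75): informative
print(likelihood_ratios(y_true, 1 - np.asarray(y_pred)))   # (0.75, 1.5): flipped
```

Flipping every prediction swaps the roles of the two ratios, which is exactly the (LR_+ < 1), (LR_- > 1) signature described above.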

Rule-of-thumb strength of evidence (very domain dependent):

| Evidence | (LR_+) | (LR_-)  |
|----------|--------|---------|
| small    | 2–5    | 0.5–0.2 |
| moderate | 5–10   | 0.2–0.1 |
| large    | > 10   | < 0.1   |

Pitfalls

  • The metric needs hard predictions (class labels). If your model outputs probabilities, you must choose a threshold first.

  • (LR_+) is undefined when (FP=0) ((\text{FPR}=0)). (LR_-) is undefined when (TN=0) ((\text{TNR}=0)). Small datasets can make this happen easily.

  • Multi-class problems need a one-vs-rest reduction; scikit-learn’s class_likelihood_ratios is binary-only.

def odds(p):
    p = np.asarray(p)
    return p / (1.0 - p)


def prob_from_odds(o):
    o = np.asarray(o)
    return o / (1.0 + o)


def update_probability(p_pre, lr):
    '''Bayes update in odds form.'''
    return prob_from_odds(odds(p_pre) * lr)


p_pre = np.linspace(0.001, 0.999, 400)

lr_plus_values = [2, 5, 10]
lr_minus_values = [0.5, 0.2, 0.1]

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=(
        "Post-test probability after a POSITIVE prediction (use LR+)",
        "Post-test probability after a NEGATIVE prediction (use LR-)",
    ),
)

for lr in lr_plus_values:
    fig.add_trace(
        go.Scatter(x=p_pre, y=update_probability(p_pre, lr), mode="lines", name=f"LR+={lr}"),
        row=1,
        col=1,
    )

for lr in lr_minus_values:
    fig.add_trace(
        go.Scatter(x=p_pre, y=update_probability(p_pre, lr), mode="lines", name=f"LR-={lr}"),
        row=1,
        col=2,
    )

# Reference line: no change
fig.add_trace(
    go.Scatter(x=p_pre, y=p_pre, mode="lines", line=dict(dash="dash"), name="no change"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=p_pre, y=p_pre, mode="lines", line=dict(dash="dash"), showlegend=False),
    row=1,
    col=2,
)

fig.update_xaxes(title_text="pre-test probability", range=[0, 1], row=1, col=1)
fig.update_xaxes(title_text="pre-test probability", range=[0, 1], row=1, col=2)
fig.update_yaxes(title_text="post-test probability", range=[0, 1], row=1, col=1)
fig.update_yaxes(title_text="post-test probability", range=[0, 1], row=1, col=2)

fig.update_layout(width=1000, height=420)
fig.show()

4) Threshold dependence and ROC geometry#

If your model outputs a score or probability (\hat{p}), you get hard predictions via a threshold (t):

[ \hat{y}(t) = \mathbb{1}[\hat{p} \ge t]. ]

So (LR_+) and (LR_-) are functions of the threshold.

On the ROC plane (x = FPR, y = TPR) for a particular threshold:

  • (LR_+ = \frac{\text{TPR}}{\text{FPR}}) is the slope of the line from ((0,0)) to the ROC point.

  • (LR_- = \frac{1-\text{TPR}}{1-\text{FPR}}) is the slope of the line from ((1,1)) to the ROC point.

This makes the metric visually interpretable: a large (LR_+) requires an ROC point whose ray from the origin is steep, and a small (LR_-) requires a point close to the top-left corner, so that the line back to ((1,1)) is nearly flat.

# Synthetic 2D dataset for visualization
X, y = make_classification(
    n_samples=2200,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    n_clusters_per_class=1,
    class_sep=1.2,
    flip_y=0.05,
    random_state=7,
)

X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=7
)

# Standardize (helps gradient descent)
mean_ = X_train.mean(axis=0)
std_ = X_train.std(axis=0)
X_train_s = (X_train - mean_) / std_
X_val_s = (X_val - mean_) / std_
X_test_s = (X_test - mean_) / std_


def sigmoid(z):
    # Stable sigmoid
    z = np.asarray(z)
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out


def fit_logreg_gd(X, y, *, lr=0.15, n_steps=2500, l2=0.01, seed=7):
    rng_local = np.random.default_rng(seed)
    n, d = X.shape
    w = rng_local.normal(scale=0.1, size=d)
    b = 0.0

    eps = 1e-12
    losses = []

    for step in range(n_steps):
        z = X @ w + b
        p = sigmoid(z)

        # Binary cross-entropy + L2
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)) + 0.5 * l2 * np.sum(w * w)

        # Gradients
        grad_w = (X.T @ (p - y)) / n + l2 * w
        grad_b = np.mean(p - y)

        w -= lr * grad_w
        b -= lr * grad_b

        if step % 25 == 0:
            losses.append(loss)

    return w, b, np.array(losses)


w, b, losses = fit_logreg_gd(X_train_s, y_train)

fig = go.Figure()
fig.add_trace(go.Scatter(y=losses, mode="lines", name="train loss"))
fig.update_layout(
    title="Logistic regression from scratch (gradient descent)",
    xaxis_title="checkpoint (every 25 steps)",
    yaxis_title="cross-entropy loss",
    width=900,
    height=380,
)
fig.show()

p_val = sigmoid(X_val_s @ w + b)
# Probability distributions by class (validation set)

df = {
    "p_hat": p_val,
    "y": y_val.astype(int),
}

fig = px.histogram(
    df,
    x="p_hat",
    color="y",
    nbins=50,
    opacity=0.6,
    barmode="overlay",
    histnorm="probability",
    title="Predicted probabilities by true class (validation set)",
    labels={"p_hat": "predicted P(y=1|x)", "y": "true class"},
)
fig.update_layout(width=900, height=420)
fig.show()
def sweep_thresholds(y_true, y_proba, thresholds):
    rows = []
    for t in thresholds:
        y_pred = (y_proba >= t).astype(int)
        tp, fp, tn, fn = confusion_counts_binary(y_true, y_pred, labels=[0, 1])

        pos_total = tp + fn
        neg_total = tn + fp

        tpr = tp / pos_total if pos_total > 0 else np.nan
        fnr = fn / pos_total if pos_total > 0 else np.nan
        fpr = fp / neg_total if neg_total > 0 else np.nan
        tnr = tn / neg_total if neg_total > 0 else np.nan

        lr_plus = tpr / fpr if (np.isfinite(fpr) and fpr > 0) else np.nan
        lr_minus = fnr / tnr if (np.isfinite(tnr) and tnr > 0) else np.nan

        dor = (
            lr_plus / lr_minus
            if (np.isfinite(lr_plus) and np.isfinite(lr_minus) and lr_plus > 0 and lr_minus > 0)
            else np.nan
        )

        rows.append((t, tp, fp, tn, fn, tpr, tnr, lr_plus, lr_minus, dor))

    arr = np.array(rows, dtype=float)
    return {
        "threshold": arr[:, 0],
        "tp": arr[:, 1],
        "fp": arr[:, 2],
        "tn": arr[:, 3],
        "fn": arr[:, 4],
        "tpr": arr[:, 5],
        "tnr": arr[:, 6],
        "lr_plus": arr[:, 7],
        "lr_minus": arr[:, 8],
        "dor": arr[:, 9],
    }


thresholds = np.linspace(0.01, 0.99, 99)
sweep = sweep_thresholds(y_val, p_val, thresholds)


def pick_operating_points(sweep, *, min_sensitivity=0.95, min_specificity=0.95):
    thresholds = sweep["threshold"]

    sens = sweep["tpr"]  # sensitivity
    spec = sweep["tnr"]  # specificity
    lr_plus = sweep["lr_plus"]
    lr_minus = sweep["lr_minus"]

    # A generic way to combine LR+ and LR- into one objective: diagnostic odds ratio (DOR)
    # Fallback: Youden's J = sensitivity + specificity - 1 (always defined as long as rates are defined)
    dor = sweep["dor"]
    youden_j = sens + spec - 1

    if np.any(np.isfinite(dor)):
        t_best = thresholds[np.nanargmax(dor)]
        best_label = "max DOR"
    else:
        t_best = thresholds[np.nanargmax(youden_j)]
        best_label = "max Youden J (fallback)"

    # Screening: prioritize ruling OUT => minimize LR- while keeping sensitivity high
    mask_screen = (sens >= min_sensitivity) & np.isfinite(lr_minus)
    if mask_screen.any():
        t_screen = thresholds[mask_screen][np.nanargmin(lr_minus[mask_screen])]
    else:
        t_screen = thresholds[np.nanargmin(lr_minus)]

    # Confirmation: prioritize ruling IN => maximize LR+ while keeping specificity high
    mask_confirm = (spec >= min_specificity) & np.isfinite(lr_plus)
    if mask_confirm.any():
        t_confirm = thresholds[mask_confirm][np.nanargmax(lr_plus[mask_confirm])]
    else:
        t_confirm = thresholds[np.nanargmax(lr_plus)]

    return t_best, t_screen, t_confirm, best_label


t_best, t_screen, t_confirm, best_label = pick_operating_points(sweep)

(t_best, t_screen, t_confirm, best_label)
(0.46, 0.05, 0.93, 'max DOR')
def _vline(fig, x, *, label, color):
    fig.add_vline(x=x, line_width=2, line_dash="dash", line_color=color)
    fig.add_annotation(
        x=x,
        y=1.02,
        xref="x",
        yref="paper",
        text=label,
        showarrow=False,
        font=dict(color=color),
    )


fig = make_subplots(rows=1, cols=2, subplot_titles=("LR+ vs threshold", "LR- vs threshold"))

fig.add_trace(
    go.Scatter(x=sweep["threshold"], y=sweep["lr_plus"], mode="lines", name="LR+"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=sweep["threshold"], y=sweep["lr_minus"], mode="lines", name="LR-"),
    row=1,
    col=2,
)

for x, label, color in [
    (t_best, best_label, "#1f77b4"),
    (t_screen, "screening", "#2ca02c"),
    (t_confirm, "confirm", "#d62728"),
]:
    _vline(fig, x, label=label, color=color)

fig.update_yaxes(type="log", row=1, col=1)
fig.update_yaxes(type="log", row=1, col=2)
fig.update_xaxes(title_text="threshold t", row=1, col=1)
fig.update_xaxes(title_text="threshold t", row=1, col=2)
fig.update_yaxes(title_text="LR+ (log scale)", row=1, col=1)
fig.update_yaxes(title_text="LR- (log scale)", row=1, col=2)

fig.update_layout(width=1000, height=420)
fig.show()
# ROC curve (validation set) + geometric interpretation of LR

fpr, tpr, thr = roc_curve(y_val, p_val)

fig = go.Figure()
fig.add_trace(go.Scatter(x=fpr, y=tpr, mode="lines", name="ROC"))
fig.add_trace(
    go.Scatter(x=[0, 1], y=[0, 1], mode="lines", line=dict(dash="dash"), name="random")
)

# Get the ROC point closest to our chosen threshold t_best
# (roc_curve returns thresholds in decreasing order)
idx = np.argmin(np.abs(thr - t_best))
x_pt, y_pt = fpr[idx], tpr[idx]

# LR slopes at that operating point
lr_plus = y_pt / x_pt if x_pt > 0 else np.inf
lr_minus = (1 - y_pt) / (1 - x_pt) if (1 - x_pt) > 0 else np.inf

fig.add_trace(
    go.Scatter(
        x=[x_pt],
        y=[y_pt],
        mode="markers",
        marker=dict(size=10, color="#1f77b4"),
        name=f"t≈{t_best:.2f}",
    )
)

# Lines showing the slopes
fig.add_trace(
    go.Scatter(x=[0, x_pt], y=[0, y_pt], mode="lines", line=dict(color="#1f77b4"), showlegend=False)
)
fig.add_trace(
    go.Scatter(x=[1, x_pt], y=[1, y_pt], mode="lines", line=dict(color="#d62728"), showlegend=False)
)

fig.update_layout(
    title=f"ROC geometry at t≈{t_best:.2f}:  LR+≈{lr_plus:.2f},  LR-≈{lr_minus:.2f}",
    xaxis_title="FPR",
    yaxis_title="TPR",
    width=900,
    height=500,
    xaxis=dict(range=[0, 1]),
    yaxis=dict(range=[0, 1]),
)
fig.show()
def metrics_at_threshold(y_true, y_proba, t):
    y_pred = (y_proba >= t).astype(int)
    lr_p, lr_m = class_likelihood_ratios_numpy(y_true, y_pred, labels=[0, 1], raise_warning=False)
    tp, fp, tn, fn = confusion_counts_binary(y_true, y_pred, labels=[0, 1])
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return {
        "t": t,
        "tp": tp,
        "fp": fp,
        "tn": tn,
        "fn": fn,
        "tpr": tpr,
        "tnr": tnr,
        "lr_plus": lr_p,
        "lr_minus": lr_m,
    }


m_best = metrics_at_threshold(y_val, p_val, t_best)
m_screen = metrics_at_threshold(y_val, p_val, t_screen)
m_confirm = metrics_at_threshold(y_val, p_val, t_confirm)

m_best, m_screen, m_confirm
({'t': 0.46,
  'tp': 203.0,
  'fp': 9.0,
  'tn': 212.0,
  'fn': 16.0,
  'tpr': 0.9269406392694064,
  'tnr': 0.9592760180995475,
  'lr_plus': 22.76154236428209,
  'lr_minus': 0.07616093736538296},
 {'t': 0.05,
  'tp': 218.0,
  'fp': 145.0,
  'tn': 76.0,
  'fn': 1.0,
  'tpr': 0.9954337899543378,
  'tnr': 0.3438914027149321,
  'lr_plus': 1.5171783971028183,
  'lr_minus': 0.01327805815909637},
 {'t': 0.93,
  'tp': 115.0,
  'fp': 2.0,
  'tn': 219.0,
  'fn': 104.0,
  'tpr': 0.5251141552511416,
  'tnr': 0.9909502262443439,
  'lr_plus': 58.02511415525114,
  'lr_minus': 0.4792227017785283})
fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=(
        f"Screening (t={t_screen:.2f})",
        f"Max DOR (t={t_best:.2f})",
        f"Confirm (t={t_confirm:.2f})",
    ),
)

for j, m in enumerate([m_screen, m_best, m_confirm], start=1):
    cm = np.array([[m["tn"], m["fp"]], [m["fn"], m["tp"]]], dtype=float)
    fig.add_trace(
        go.Heatmap(
            z=cm,
            x=["pred 0", "pred 1"],
            y=["true 0", "true 1"],
            colorscale="Blues",
            showscale=False,
            text=cm.astype(int),
            texttemplate="%{text}",
            textfont=dict(size=16),
        ),
        row=1,
        col=j,
    )

fig.update_layout(
    width=1050,
    height=420,
    title="Confusion matrices at three operating points (validation set)",
)
fig.show()
# How the chosen operating point changes post-test probability

p_pre = 0.10  # example prior/prevalence

for name, m in [("screening", m_screen), ("max_dor", m_best), ("confirm", m_confirm)]:
    p_pos = update_probability(p_pre, m["lr_plus"])   # after a positive prediction
    p_neg = update_probability(p_pre, m["lr_minus"])  # after a negative prediction
    print(
        f"{name:9s}  t={m['t']:.2f}  LR+={m['lr_plus']:.2f}  LR-={m['lr_minus']:.2f}  "
        f"p(y=1|+)= {p_pos:.3f}  p(y=1|-)= {p_neg:.3f}"
    )
screening  t=0.05  LR+=1.52  LR-=0.01  p(y=1|+)= 0.144  p(y=1|-)= 0.001
max_dor    t=0.46  LR+=22.76  LR-=0.08  p(y=1|+)= 0.717  p(y=1|-)= 0.008
confirm    t=0.93  LR+=58.03  LR-=0.48  p(y=1|+)= 0.866  p(y=1|-)= 0.051

5) Using likelihood ratios to optimize a simple algorithm#

(LR_+) and (LR_-) are defined through counts (TP/FP/TN/FN), so they are not differentiable w.r.t. model parameters.

A common workflow is therefore:

  1. Train a probabilistic model (e.g. logistic regression) using a differentiable loss (cross-entropy)

  2. Use likelihood ratios on a validation set to pick an operating point (decision threshold)

Example strategies:

  • Screening test (rule out): pick a threshold with high sensitivity and minimal (LR_-)

  • Confirmatory test (rule in): pick a threshold with high specificity and maximal (LR_+)

  • Single-number optimization: maximize DOR = (LR_+/LR_-) (useful, but can be unstable if FP or FN are small)

# Final check on a held-out test set using the max-DOR threshold from validation

p_test = sigmoid(X_test_s @ w + b)
y_pred_test = (p_test >= t_best).astype(int)

print("Test set LR (sklearn):", class_likelihood_ratios(y_test, y_pred_test, labels=[0, 1]))
print("Test set LR (numpy) :", class_likelihood_ratios_numpy(y_test, y_pred_test, labels=[0, 1], raise_warning=False))
Test set LR (sklearn): (11.044393708777271, 0.10936410464043908)
Test set LR (numpy) : (11.044393708777271, 0.10936410464043908)

Pros / cons / when to use#

Pros

  • Interpretable: directly tells you how to update odds (prevalence + test result (\to) posterior)

  • Uses sensitivity/specificity, so it is more stable across different prevalences than precision/NPV

  • Naturally supports “rule-in” (large (LR_+)) vs “rule-out” (small (LR_-)) thinking

Cons

  • Threshold-dependent and based on hard predictions (not a ranking metric like AUC)

  • Can be undefined/infinite when (FP=0) or (TN=0), especially on small datasets

  • Binary-only; multi-class needs one-vs-rest and careful reporting

Good fits

  • Medical diagnostic tests, screening vs confirmation

  • Any binary decision where base rate/prevalence is known or can be estimated and you need a domain-friendly “odds update” explanation

Exercises#

  1. On a dataset you care about, sweep thresholds and compare:

    • max (LR_+) at specificity (\ge 0.95)

    • min (LR_-) at sensitivity (\ge 0.95)

    • max DOR

    Do these thresholds match what you would pick using accuracy or F1?

  2. Implement one-vs-rest likelihood ratios for multi-class classification and report the per-class (LR_+) and (LR_-).

References#

  • scikit-learn: sklearn.metrics.class_likelihood_ratios

  • Wikipedia: https://en.wikipedia.org/wiki/Likelihood_ratios_in_diagnostic_testing